Overview

Dataset statistics

Number of variables: 19
Number of observations: 36453
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.3 MiB
Average record size in memory: 152.0 B
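The two memory rows are consistent with every column being stored in an 8-byte numeric dtype such as int64 or float64: 36453 rows × 19 columns × 8 bytes is about 5.28 MiB, and 19 × 8 = 152 bytes per record. The dtypes are an assumption (the report does not list them), and index overhead is ignored. A quick check of the arithmetic:

```python
# Assumption: all 19 columns use an 8-byte numeric dtype (e.g. int64/float64);
# index overhead is ignored.
N_ROWS = 36453
N_COLS = 19
BYTES_PER_CELL = 8

total_bytes = N_ROWS * N_COLS * BYTES_PER_CELL
total_mib = total_bytes / 2**20        # 1 MiB = 1024 * 1024 bytes
record_bytes = total_bytes / N_ROWS    # average bytes per row (19 cells x 8 bytes)

print(f"Total size in memory: {total_mib:.1f} MiB")
print(f"Average record size in memory: {record_bytes:.1f} B")
```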

Variable types

Numeric: 9
Categorical: 10

Alerts

Unnamed: 0 is highly correlated with ID [High correlation]
ID is highly correlated with Unnamed: 0 [High correlation]
NAME_FAMILY_STATUS is highly correlated with CNT_FAM_MEMBERS [High correlation]
CNT_FAM_MEMBERS is highly correlated with NAME_FAMILY_STATUS [High correlation]
CODE_GENDER is highly correlated with FLAG_OWN_CAR and 1 other field [High correlation]
FLAG_OWN_CAR is highly correlated with CODE_GENDER [High correlation]
NAME_INCOME_TYPE is highly correlated with OCCUPATION_TYPE and 2 other fields [High correlation]
OCCUPATION_TYPE is highly correlated with CODE_GENDER and 1 other field [High correlation]
AGE is highly correlated with NAME_INCOME_TYPE [High correlation]
YEARS_EMPLOYED is highly correlated with NAME_INCOME_TYPE [High correlation]
Unnamed: 0 is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
ID has unique values [Unique]
OCCUPATION_TYPE has 1241 (3.4%) zeros [Zeros]
YEARS_EMPLOYED has 6135 (16.8%) zeros [Zeros]

Reproduction

Analysis started: 2022-04-27 14:29:02.072889
Analysis finished: 2022-04-27 14:29:45.009411
Duration: 42.94 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
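The Duration row is the difference between the two timestamps above: 45.009411 s − 2.072889 s = 42.936522 s, which rounds to the 42.94 seconds shown. A minimal check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2022-04-27 14:29:02.072889")
finished = datetime.fromisoformat("2022-04-27 14:29:45.009411")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```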

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 36453
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18226
Minimum: 0
Maximum: 36452
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1822.6
Q1: 9113
median: 18226
Q3: 27339
95-th percentile: 34629.4
Maximum: 36452
Range: 36452
Interquartile range (IQR): 18226

Descriptive statistics

Standard deviation: 10523.21902
Coefficient of variation (CV): 0.5773740271
Kurtosis: -1.2
Mean: 18226
Median Absolute Deviation (MAD): 9113
Skewness: 0
Sum: 664392378
Variance: 110738138.5
Monotonicity: Strictly increasing

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count   Frequency (%)
0                     1       < 0.1%
24305                 1       < 0.1%
24299                 1       < 0.1%
24300                 1       < 0.1%
24301                 1       < 0.1%
24302                 1       < 0.1%
24303                 1       < 0.1%
24304                 1       < 0.1%
24306                 1       < 0.1%
24297                 1       < 0.1%
Other values (36443)  36443   > 99.9%

Minimum 10 values

Value                 Count   Frequency (%)
0                     1       < 0.1%
1                     1       < 0.1%
2                     1       < 0.1%
3                     1       < 0.1%
4                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
7                     1       < 0.1%
8                     1       < 0.1%
9                     1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
36452                 1       < 0.1%
36451                 1       < 0.1%
36450                 1       < 0.1%
36449                 1       < 0.1%
36448                 1       < 0.1%
36447                 1       < 0.1%
36446                 1       < 0.1%
36445                 1       < 0.1%
36444                 1       < 0.1%
36443                 1       < 0.1%
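Because Unnamed: 0 is exactly the integers 0, 1, …, 36452 (all distinct, strictly increasing, minimum 0, maximum 36452), its summary statistics follow from the sequence alone. The sketch below recomputes them with the standard library. It assumes the report uses the sample (n − 1) variance, which the Variance row bears out, and uses the population kurtosis formula for a discrete uniform distribution; a bias-corrected sample estimator differs only negligibly at this size.

```python
import statistics

# Unnamed: 0 holds the integers 0, 1, ..., N - 1 (a saved row index).
N = 36453
values = range(N)

mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance: divides by N - 1
std = variance ** 0.5
cv = std / mean                          # coefficient of variation

# Population excess kurtosis of a discrete uniform on N equally spaced points;
# it approaches -1.2 as N grows.
kurtosis = -6 * (N**2 + 1) / (5 * (N**2 - 1))

print(f"Mean: {mean}")
print(f"Variance: {variance}")
print(f"Standard deviation: {std:.5f}")
print(f"Coefficient of variation (CV): {cv:.10f}")
print(f"Kurtosis: {kurtosis:.1f}")
```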

ID
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 36453
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5078227.661
Minimum: 5008804
Maximum: 5150487
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 5008804
5-th percentile: 5018456.2
Q1: 5042027
median: 5074615
Q3: 5115397
95-th percentile: 5146024.4
Maximum: 5150487
Range: 141683
Interquartile range (IQR): 73370

Descriptive statistics

Standard deviation: 41877.01797
Coefficient of variation (CV): 0.00824638452
Kurtosis: -1.212733304
Mean: 5078227.661
Median Absolute Deviation (MAD): 38094
Skewness: 0.08619147174
Sum: 1.851166329 × 10^11
Variance: 1753684634
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count   Frequency (%)
5008804               1       < 0.1%
5096994               1       < 0.1%
5096987               1       < 0.1%
5096988               1       < 0.1%
5096990               1       < 0.1%
5096991               1       < 0.1%
5096992               1       < 0.1%
5096993               1       < 0.1%
5096995               1       < 0.1%
5096982               1       < 0.1%
Other values (36443)  36443   > 99.9%

Minimum 10 values

Value                 Count   Frequency (%)
5008804               1       < 0.1%
5008805               1       < 0.1%
5008806               1       < 0.1%
5008808               1       < 0.1%
5008809               1       < 0.1%
5008810               1       < 0.1%
5008811               1       < 0.1%
5008812               1       < 0.1%
5008813               1       < 0.1%
5008814               1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
5150487               1       < 0.1%
5150485               1       < 0.1%
5150484               1       < 0.1%
5150483               1       < 0.1%
5150482               1       < 0.1%
5150481               1       < 0.1%
5150480               1       < 0.1%
5150479               1       < 0.1%
5150478               1       < 0.1%
5150477               1       < 0.1%

CODE_GENDER
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value                 Count   Frequency (%)
0                     24429   67.0%
1                     12024   33.0%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.
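Each Frequency (%) entry is a value's count divided by the 36453 observations. As a cross-check, the sketch below recomputes the percentages for this and the other two-valued columns in the report from their listed counts; the dictionary is transcribed by hand from the Common Values tables, and `frequencies` is a hypothetical helper written for this illustration.

```python
from typing import Dict

N_OBS = 36453  # Number of observations from the Overview

# Counts transcribed from the "Common Values" table of each two-valued column.
binary_counts: Dict[str, Dict[int, int]] = {
    "CODE_GENDER": {0: 24429, 1: 12024},
    "FLAG_OWN_CAR": {0: 22613, 1: 13840},
    "FLAG_OWN_REALTY": {1: 24502, 0: 11951},
    "FLAG_WORK_PHONE": {0: 28232, 1: 8221},
    "FLAG_PHONE": {0: 25706, 1: 10747},
    "FLAG_EMAIL": {0: 33182, 1: 3271},
    "STATUS": {0: 32164, 1: 4289},
}


def frequencies(counts: Dict[int, int], n_obs: int) -> Dict[int, float]:
    """Return each value's share of all observations in percent, rounded to 0.1."""
    if sum(counts.values()) != n_obs:
        raise ValueError("counts do not add up to the number of observations")
    return {value: round(100 * count / n_obs, 1) for value, count in counts.items()}


for column, counts in binary_counts.items():
    print(column, frequencies(counts, N_OBS))
```

Every pair of counts sums to 36453, consistent with the report's zero missing cells.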

FLAG_OWN_CAR
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value                 Count   Frequency (%)
0                     22613   62.0%
1                     13840   38.0%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

FLAG_OWN_REALTY
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value                 Count   Frequency (%)
1                     24502   67.2%
0                     11951   32.8%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

AMT_INCOME_TOTAL
Real number (ℝ≥0)

Distinct: 265
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 186684.6186
Minimum: 27000
Maximum: 1575000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 27000
5-th percentile: 76500
Q1: 121500
median: 157500
Q3: 225000
95-th percentile: 360000
Maximum: 1575000
Range: 1548000
Interquartile range (IQR): 103500

Descriptive statistics

Standard deviation: 101793.4761
Coefficient of variation (CV): 0.5452697544
Kurtosis: 17.59701591
Mean: 186684.6186
Median Absolute Deviation (MAD): 45000
Skewness: 2.739006571
Sum: 6805214402
Variance: 1.036191178 × 10^10
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count   Frequency (%)
135000                4309    11.8%
180000                3097    8.5%
157500                3089    8.5%
112500                2955    8.1%
225000                2923    8.0%
202500                2192    6.0%
90000                 1769    4.9%
270000                1675    4.6%
315000                1001    2.7%
67500                 873     2.4%
Other values (255)    12570   34.5%

Minimum 10 values

Value                 Count   Frequency (%)
27000                 3       < 0.1%
29250                 7       < 0.1%
30150                 3       < 0.1%
31500                 16      < 0.1%
31531.5               3       < 0.1%
31950                 1       < 0.1%
32400                 5       < 0.1%
33300                 10      < 0.1%
33750                 1       < 0.1%
36000                 5       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
1575000               8       < 0.1%
1350000               6       < 0.1%
1125000               3       < 0.1%
990000                4       < 0.1%
945000                4       < 0.1%
900000                39      0.1%
810000                15      < 0.1%
787500                5       < 0.1%
765000                9       < 0.1%
742500                5       < 0.1%
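Several rows in the statistics tables are derived from others: Range is Maximum − Minimum, the interquartile range is Q3 − Q1, the coefficient of variation is standard deviation ÷ mean, and Variance is the squared standard deviation. The helper below (`derived_stats` is a hypothetical name written for this illustration) recomputes them from the AMT_INCOME_TOTAL figures. It works from the rounded values printed in the report, so the trailing digits can differ slightly.

```python
# Figures copied from the AMT_INCOME_TOTAL quantile and descriptive statistics.
summary = {
    "min": 27000,
    "q1": 121500,
    "q3": 225000,
    "max": 1575000,
    "mean": 186684.6186,
    "std": 101793.4761,
}


def derived_stats(s: dict) -> dict:
    """Recompute range, IQR, coefficient of variation and variance from a summary."""
    return {
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
        "cv": s["std"] / s["mean"],  # spread relative to the mean (dimensionless)
        "variance": s["std"] ** 2,
    }


print(derived_stats(summary))
```

The same helper applied to the ID figures gives a range of 141683 and an IQR of 73370, matching that variable's table.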

NAME_INCOME_TYPE
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4
2nd row: 4
3rd row: 4
4th row: 0
5th row: 0

Common Values

Value                 Count   Frequency (%)
4                     18815   51.6%
0                     8490    23.3%
1                     6152    16.9%
2                     2985    8.2%
3                     11      < 0.1%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 4
4th row: 4
5th row: 4

Common Values

Value                 Count   Frequency (%)
4                     24773   68.0%
1                     9864    27.1%
2                     1410    3.9%
3                     374     1.0%
0                     32      0.1%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

NAME_FAMILY_STATUS
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 3
5th row: 3

Common Values

Value                 Count   Frequency (%)
1                     25048   68.7%
3                     4828    13.2%
0                     2945    8.1%
2                     2100    5.8%
4                     1532    4.2%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

NAME_HOUSING_TYPE
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.282912243
Minimum: 0
Maximum: 5
Zeros: 168
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.9517223501
Coefficient of variation (CV): 0.7418452472
Kurtosis: 9.494862667
Mean: 1.282912243
Median Absolute Deviation (MAD): 0
Skewness: 3.29061941
Sum: 46766
Variance: 0.9057754317
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value                 Count   Frequency (%)
1                     32544   89.3%
5                     1776    4.9%
2                     1128    3.1%
4                     575     1.6%
3                     262     0.7%
0                     168     0.5%

Minimum 10 values

Value                 Count   Frequency (%)
0                     168     0.5%
1                     32544   89.3%
2                     1128    3.1%
3                     262     0.7%
4                     575     1.6%
5                     1776    4.9%

Maximum 10 values

Value                 Count   Frequency (%)
5                     1776    4.9%
4                     575     1.6%
3                     262     0.7%
2                     1128    3.1%
1                     32544   89.3%
0                     168     0.5%
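NAME_HOUSING_TYPE takes only six values, so its Sum, Mean and Variance can be rebuilt from the value-count table alone as count-weighted sums. A minimal sketch; it assumes the sample (n − 1) variance, which matches the report's Variance row.

```python
# Value counts copied from the NAME_HOUSING_TYPE tables (value -> count).
counts = {0: 168, 1: 32544, 2: 1128, 3: 262, 4: 575, 5: 1776}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
sum_sq = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = sum_sq / (n - 1)

print(f"n={n}, sum={total}, mean={mean:.9f}, variance={variance:.10f}")
```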

FLAG_WORK_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value                 Count   Frequency (%)
0                     28232   77.4%
1                     8221    22.6%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

FLAG_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 1

Common Values

Value                 Count   Frequency (%)
0                     25706   70.5%
1                     10747   29.5%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

FLAG_EMAIL
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 1

Common Values

Value                 Count   Frequency (%)
0                     33182   91.0%
1                     3271    9.0%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

OCCUPATION_TYPE
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.170712973
Minimum: 0
Maximum: 18
Zeros: 1241
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 6
median: 10
Q3: 12
95-th percentile: 15
Maximum: 18
Range: 18
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.292705391
Coefficient of variation (CV): 0.4680885122
Kurtosis: -0.7084961249
Mean: 9.170712973
Median Absolute Deviation (MAD): 2
Skewness: -0.4382491506
Sum: 334300
Variance: 18.42731958
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=19)]

Common values

Value                 Count   Frequency (%)
12                    11323   31.1%
8                     6211    17.0%
3                     3591    9.9%
15                    3485    9.6%
10                    3012    8.3%
4                     2135    5.9%
6                     1383    3.8%
0                     1241    3.4%
11                    1207    3.3%
2                     655     1.8%
Other values (9)      2210    6.1%

Minimum 10 values

Value                 Count   Frequency (%)
0                     1241    3.4%
1                     551     1.5%
2                     655     1.8%
3                     3591    9.9%
4                     2135    5.9%
5                     85      0.2%
6                     1383    3.8%
7                     60      0.2%
8                     6211    17.0%
9                     175     0.5%

Maximum 10 values

Value                 Count   Frequency (%)
18                    173     0.5%
17                    592     1.6%
16                    151     0.4%
15                    3485    9.6%
14                    79      0.2%
13                    344     0.9%
12                    11323   31.1%
11                    1207    3.3%
10                    3012    8.3%
9                     175     0.5%

CNT_FAM_MEMBERS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.196911091
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 3
95-th percentile: 4
Maximum: 9
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8994885611
Coefficient of variation (CV): 0.4094333015
Kurtosis: 1.229319259
Mean: 2.196911091
Median Absolute Deviation (MAD): 0
Skewness: 0.907514612
Sum: 80084
Variance: 0.8090796716
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=8)]

Common values

Value                 Count   Frequency (%)
2                     19463   53.4%
1                     6987    19.2%
3                     6421    17.6%
4                     3106    8.5%
5                     397     1.1%
6                     58      0.2%
7                     19      0.1%
9                     2       < 0.1%

Minimum 10 values

Value                 Count   Frequency (%)
1                     6987    19.2%
2                     19463   53.4%
3                     6421    17.6%
4                     3106    8.5%
5                     397     1.1%
6                     58      0.2%
7                     19      0.1%
9                     2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
9                     2       < 0.1%
7                     19      0.1%
6                     58      0.2%
5                     397     1.1%
4                     3106    8.5%
3                     6421    17.6%
2                     19463   53.4%
1                     6987    19.2%
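The quantiles of a discrete column can be read directly from its value counts: sort the values, accumulate the counts, and locate position h = p·(n − 1) in the sorted data. The sketch below assumes linear interpolation between neighbouring order statistics (the default in pandas and NumPy; the report does not state its method) and reproduces the CNT_FAM_MEMBERS quantiles. `quantile_from_counts` is a hypothetical helper written for this illustration.

```python
import math

# Value counts copied from the CNT_FAM_MEMBERS tables (value -> count).
counts = {1: 6987, 2: 19463, 3: 6421, 4: 3106, 5: 397, 6: 58, 7: 19, 9: 2}


def quantile_from_counts(counts: dict, p: float) -> float:
    """Linear-interpolation quantile computed from value counts.

    The data are never expanded: the values at sorted positions floor(h)
    and ceil(h), with h = p * (n - 1), are found by walking cumulative counts.
    """
    n = sum(counts.values())
    h = p * (n - 1)

    def value_at(index: int) -> int:
        seen = 0
        for value in sorted(counts):
            seen += counts[value]
            if index < seen:
                return value
        raise IndexError(index)

    lo, hi = math.floor(h), math.ceil(h)
    return value_at(lo) + (h - lo) * (value_at(hi) - value_at(lo))


for label, p in [("5-th percentile", 0.05), ("Q1", 0.25), ("median", 0.5),
                 ("Q3", 0.75), ("95-th percentile", 0.95)]:
    print(label, quantile_from_counts(counts, p))
```

Because each value repeats hundreds or thousands of times, both neighbours of every position here hold the same value, so the interpolation term is zero and the quantiles come out as whole numbers.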

AGE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7182
Distinct (%): 19.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.73850772
Minimum: 20.50418558
Maximum: 68.86383704
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 20.50418558
5-th percentile: 27.03409379
Q1: 34.11979712
median: 42.61004675
Q3: 53.2194364
95-th percentile: 63.02388139
Maximum: 68.86383704
Range: 48.35965146
Interquartile range (IQR): 19.09963928

Descriptive statistics

Standard deviation: 11.501045
Coefficient of variation (CV): 0.2629501005
Kurtosis: -1.045705537
Mean: 43.73850772
Median Absolute Deviation (MAD): 9.377331499
Skewness: 0.18427999
Sum: 1594399.822
Variance: 132.2740361
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count   Frequency (%)
42.48957884           54      0.1%
34.70570922           54      0.1%
46.25967679           38      0.1%
40.15688207           37      0.1%
45.90922469           32      0.1%
41.45191209           32      0.1%
42.91669233           32      0.1%
37.75026181           30      0.1%
38.70305345           30      0.1%
39.4258609            29      0.1%
Other values (7172)   36085   99.0%

Minimum 10 values

Value                 Count   Frequency (%)
20.50418558           1       < 0.1%
21.09557349           1       < 0.1%
21.14485581           2       < 0.1%
21.23794465           4       < 0.1%
21.79100187           2       < 0.1%
21.84849792           1       < 0.1%
22.01551024           4       < 0.1%
22.05110303           1       < 0.1%
22.05657885           2       < 0.1%
22.08669583           1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
68.86383704           2       < 0.1%
68.83098216           3       < 0.1%
68.71872797           1       < 0.1%
68.68861099           1       < 0.1%
68.47505424           2       < 0.1%
68.36553796           2       < 0.1%
68.34637262           1       < 0.1%
68.2998282            3       < 0.1%
68.2614975            4       < 0.1%
68.21221517           3       < 0.1%

YEARS_EMPLOYED
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3639
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.02440509
Minimum: 0
Maximum: 43.0207328
Zeros: 6135
Zeros (%): 16.8%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1.117066059
median: 4.249231675
Q3: 8.632620793
95-th percentile: 19.72661999
Maximum: 43.0207328
Range: 43.0207328
Interquartile range (IQR): 7.515554734

Descriptive statistics

Standard deviation: 6.480410609
Coefficient of variation (CV): 1.075693037
Kurtosis: 3.846197692
Mean: 6.02440509
Median Absolute Deviation (MAD): 3.586658179
Skewness: 1.758617789
Sum: 219607.6388
Variance: 41.99572166
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count   Frequency (%)
0                     6135    16.8%
1.09790071            78      0.2%
4.213638884           64      0.2%
0.5475814014          63      0.2%
5.714011924           61      0.2%
4.594207958           61      0.2%
6.929642635           56      0.2%
1.259437223           54      0.1%
3.175972128           53      0.1%
4.547663539           52      0.1%
Other values (3629)   29776   81.7%

Minimum 10 values

Value                 Count   Frequency (%)
0                     6135    16.8%
0.04654441912         3       < 0.1%
0.1177300013          1       < 0.1%
0.1779639555          2       < 0.1%
0.1807018625          1       < 0.1%
0.1916534905          4       < 0.1%
0.1943913975          1       < 0.1%
0.1998672115          17      < 0.1%
0.2135567465          1       < 0.1%
0.2162946536          1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
43.0207328            1       < 0.1%
42.87836164           4       < 0.1%
41.69011              1       < 0.1%
41.26573441           3       < 0.1%
41.17264557           16      < 0.1%
40.75922161           6       < 0.1%
40.54840277           8       < 0.1%
40.45257603           2       < 0.1%
39.79821625           4       < 0.1%
39.62572811           6       < 0.1%

STATUS
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value                 Count   Frequency (%)
0                     32164   88.2%
1                     4289    11.8%

[Figures: Histogram of lengths of the category; Pie chart]

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

MONTHS_BALANCE
Real number (ℝ≥0)

Distinct: 61
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.16396456
Minimum: 0
Maximum: 60
Zeros: 315
Zeros (%): 0.9%
Negative: 0
Negative (%): 0.0%
Memory size: 284.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 12
median: 24
Q3: 39
95-th percentile: 55
Maximum: 60
Range: 60
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 16.50100418
Coefficient of variation (CV): 0.6306767519
Kurtosis: -1.037660108
Mean: 26.16396456
Median Absolute Deviation (MAD): 14
Skewness: 0.2863865912
Sum: 953755
Variance: 272.283139
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count   Frequency (%)
7                     888     2.4%
11                    828     2.3%
6                     824     2.3%
8                     820     2.2%
5                     815     2.2%
17                    807     2.2%
3                     800     2.2%
10                    798     2.2%
16                    785     2.2%
15                    774     2.1%
Other values (51)     28314   77.7%

Minimum 10 values

Value                 Count   Frequency (%)
0                     315     0.9%
1                     551     1.5%
2                     643     1.8%
3                     800     2.2%
4                     765     2.1%
5                     815     2.2%
6                     824     2.3%
7                     888     2.4%
8                     820     2.2%
9                     770     2.1%

Maximum 10 values

Value                 Count   Frequency (%)
60                    321     0.9%
59                    307     0.8%
58                    332     0.9%
57                    304     0.8%
56                    345     0.9%
55                    368     1.0%
54                    358     1.0%
53                    377     1.0%
52                    463     1.3%
51                    476     1.3%

Interactions

[Figures: pairwise interaction plots of the numeric variables]
2022-04-27T10:29:41.137931image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1:

- -1 indicates a perfect negative monotonic correlation.
- 0 indicates no monotonic correlation.
- +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, first replace each variable by its ranks. Then divide the covariance of the two rank variables by the product of their standard deviations. In other words, ρ is Pearson's r computed on the ranks.
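The rank-then-correlate recipe can be sketched in plain Python. This is a minimal illustration, not the report's implementation. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical. Tied values are assumed to receive the average of the ranks they occupy, which is the usual convention.

The example also shows why ρ catches nonlinear monotonic relations: for y = x³, ρ is exactly 1 while Pearson's r is not.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors in the covariance and both standard deviations cancel,
    # so sums of centred products are enough.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman(x, y))    # 1.0
print(pearson(x, y))     # below 1, because the relation is not linear
```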

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1:

- -1 indicates a perfect negative linear correlation.
- 0 indicates no linear correlation.
- +1 indicates a perfect positive linear correlation.

Furthermore, r is invariant under separate changes in location and scale of the two variables: replacing X by a + bX and Y by c + dY with b, d > 0 leaves r unchanged. For an exact linear relationship, this means the steepness of the line does not affect r. Any positive slope gives r = +1 and any negative slope gives r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
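A minimal plain-Python sketch of this definition follows. The function name `pearson` and the sample numbers are illustrative assumptions, not taken from the report.

It also checks the invariance property numerically. Shifting and positively rescaling either variable leaves r unchanged, and an exact line gives r = +1 or r = -1 whatever its slope.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors in the covariance and in both standard deviations cancel,
    # so plain sums of centred products are enough.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson(x, y)

# Separate shifts and positive rescalings of x and y leave r unchanged (up to rounding).
r_shifted = pearson([10 + 3 * a for a in x], [-5 + 0.5 * b for b in y])

# An exact line gives r = 1 for any positive slope and r = -1 for any negative slope.
steep = pearson(x, [100 * a + 7 for a in x])
falling = pearson(x, [-0.01 * a for a in x])
```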

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1:

- -1 indicates a perfect negative correlation.
- 0 indicates no correlation.
- +1 indicates a perfect positive correlation.

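To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations:

- A pair is concordant if X and Y order its two observations the same way.
- A pair is discordant if X and Y order them in opposite ways.

τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2, where n is the number of observations. This is the τ_a form; variants such as τ_b adjust the denominator for ties.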
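The pair-counting definition translates directly into code. This sketch computes the τ_a form by brute force over all n(n-1)/2 pairs, so it is O(n²) and meant only for illustration. The name `kendall_tau_a` is hypothetical. Library routines may handle ties differently; for example, `scipy.stats.kendalltau` defaults to τ_b.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # the pair is ordered the same way in x and y
        elif s < 0:
            discordant += 1  # the pair is ordered in opposite ways
        # s == 0: tied in x or in y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant and 1 discordant pair out of 6 pairs, so tau = 4/6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```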

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V has been shown to be biased, even for large samples. We therefore use the bias-corrected measure proposed by Bergsma in 2013.
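The sketch below assumes the correction as it is usually stated for Bergsma (2013):

- φ² = χ²/n is reduced by (k-1)(r-1)/(n-1), with the result floored at 0.
- The table dimensions r and k are corrected in the same spirit.
- The square root is taken at the end.

The function name is hypothetical, and χ² is computed without a continuity correction. The report's own implementation may differ in such details.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table.

    `table` is a list of rows of observed counts. Sketch only: assumes every
    row and column total is positive.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic, without a continuity correction.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Roughly 0.49; the uncorrected V for this table is 0.5.
v = cramers_v_corrected([[30, 10], [10, 30]])
```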

Phik (φk)

Phik (φk) is a new and practical correlation coefficient. It has three useful properties:

- It works consistently across categorical, ordinal and interval variables.
- It captures non-linear dependency.
- It reverts to the Pearson correlation coefficient for a bivariate normal input distribution.

Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that makes it easy to spot patterns in data completion at a glance.
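Both missing-value views reduce to a boolean "is present" mask over the table. The sketch below uses made-up toy records and a hypothetical `nullity` helper. It computes the per-column counts behind a bar chart and prints a crude text version of the nullity matrix. It is not the plotting code the report uses.

```python
def nullity(rows, columns):
    """Return per-column non-null counts and a row-by-column presence matrix.

    `rows` is a list of dicts; a value of None or a missing key counts as null.
    """
    matrix = [[row.get(c) is not None for c in columns] for row in rows]
    counts = {c: sum(flags[j] for flags in matrix) for j, c in enumerate(columns)}
    return counts, matrix


# Toy records for illustration only.
columns = ["ID", "OCCUPATION_TYPE", "AGE"]
rows = [
    {"ID": 1, "OCCUPATION_TYPE": 8, "AGE": 30.0},
    {"ID": 2, "OCCUPATION_TYPE": None, "AGE": 41.0},
    {"ID": 3, "AGE": 55.0},
]
counts, matrix = nullity(rows, columns)

# Text version of the nullity matrix: '#' = present, '.' = missing.
for flags in matrix:
    print("".join("#" if present else "." for present in flags))
```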

Sample

First rows

       Unnamed: 0       ID  CODE_GENDER  FLAG_OWN_CAR  FLAG_OWN_REALTY  AMT_INCOME_TOTAL  NAME_INCOME_TYPE  NAME_EDUCATION_TYPE  NAME_FAMILY_STATUS  NAME_HOUSING_TYPE  FLAG_WORK_PHONE  FLAG_PHONE  FLAG_EMAIL  OCCUPATION_TYPE  CNT_FAM_MEMBERS        AGE  YEARS_EMPLOYED  STATUS  MONTHS_BALANCE
    0           0  5008804            1             1                1          427500.0                 4                    1                   0                  4                1           0           0               12                2  32.868574       12.435574       1              15
    1           1  5008805            1             1                1          427500.0                 4                    1                   0                  4                1           0           0               12                2  32.868574       12.435574       1              14
    2           2  5008806            1             1                1          112500.0                 4                    4                   1                  1                0           0           0               17                2  58.793815        3.104787       0              29
    3           3  5008808            0             0                1          270000.0                 0                    4                   3                  1                0           1           1               15                1  52.321403        8.353354       0               4
    4           4  5008809            0             0                1          270000.0                 0                    4                   3                  1                0           1           1               15                1  52.321403        8.353354       0              26
    5           5  5008810            0             0                1          270000.0                 0                    4                   3                  1                0           1           1               15                1  52.321403        8.353354       0              26
    6           6  5008811            0             0                1          270000.0                 0                    4                   3                  1                0           1           1               15                1  52.321403        8.353354       0              38
    7           7  5008812            0             0                1          283500.0                 1                    1                   2                  1                0           0           0               12                1  61.504343        0.000000       0              20
    8           8  5008813            0             0                1          283500.0                 1                    1                   2                  1                0           0           0               12                1  61.504343        0.000000       0              16
    9           9  5008814            0             0                1          283500.0                 1                    1                   2                  1                0           0           0               12                1  61.504343        0.000000       0              17

Last rows

       Unnamed: 0       ID  CODE_GENDER  FLAG_OWN_CAR  FLAG_OWN_REALTY  AMT_INCOME_TOTAL  NAME_INCOME_TYPE  NAME_EDUCATION_TYPE  NAME_FAMILY_STATUS  NAME_HOUSING_TYPE  FLAG_WORK_PHONE  FLAG_PHONE  FLAG_EMAIL  OCCUPATION_TYPE  CNT_FAM_MEMBERS        AGE  YEARS_EMPLOYED  STATUS  MONTHS_BALANCE
36443       36443  5149145            1             1                1          247500.0                 4                    4                   1                  1                1           0           0                8                2  29.985558        9.793493       1              25
36444       36444  5149158            1             1                1          247500.0                 4                    4                   1                  1                1           0           0                8                2  29.985558        9.793493       1              28
36445       36445  5149190            1             1                0          450000.0                 4                    1                   1                  1                0           1           1                3                3  26.960170        1.374429       1              11
36446       36446  5149729            1             1                1           90000.0                 4                    4                   1                  1                0           0           0               12                2  52.296762        4.711938       1              21
36447       36447  5149775            0             1                1          130500.0                 4                    4                   1                  1                0           1           0                8                2  44.181605       25.711685       1              19
36448       36448  5149828            1             1                1          315000.0                 4                    4                   1                  1                0           0           0               10                2  47.497211        6.625735       1              11
36449       36449  5149834            0             0                1          157500.0                 0                    1                   1                  1                0           1           1               11                2  33.914454        3.627727       1              23
36450       36450  5149838            0             0                1          157500.0                 1                    1                   1                  1                0           1           1               11                2  33.914454        3.627727       1              32
36451       36451  5150049            0             0                1          283500.0                 4                    4                   1                  1                0           0           0               15                2  49.167334        1.793329       1               9
36452       36452  5150337            1             0                1          112500.0                 4                    4                   3                  4                0           0           0                8                1  25.155890        3.266323       1              13